AITopics

Country:

North America > United States (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Industry: Education > Educational Setting > Online (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Neural Information Processing Systems, Feb-11-2026, 00:26:43 GMT

Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation

Generative AI and large language models hold great promise in enhancing programming education by generating individualized feedback and hints for learners. Recent work has primarily focused on improving the quality of generated feedback until it matches that of human tutors.

large language model, machine learning, natural language, (18 more...)

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.69)
Education > Curriculum > Subject-Specific Education (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)

Neural Information Processing Systems, Feb-11-2026, 00:26:40 GMT

34cc2ded6daba59357134c0b9fb06bfe-Paper-Datasets_and_Benchmarks_Track.pdf

buggy program, large language model, machine learning, (18 more...)

Country: Asia > Singapore (0.04)

Genre:

Research Report (0.68)
Workflow (0.49)

Industry:

Law (0.68)
Information Technology > Security & Privacy (0.48)
Education > Curriculum > Subject-Specific Education (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Silva, André, Thorén, Gustav, Monperrus, Martin

Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces

arXiv.org Artificial Intelligence, Dec-1-2025

Automatic program repair seeks to generate correct code from buggy programs, with most approaches searching for the correct program in a discrete, symbolic space of source code tokens. This symbolic search is fundamentally limited by its inability to directly reason about program behavior. We introduce Gradient-Based Program Repair (GBPR), a new paradigm that reframes program repair as continuous optimization in a differentiable numerical program space. Our core insight is to compile symbolic programs into differentiable numerical representations, enabling search in the numerical program space directly guided by program behavior. To evaluate GBPR, we present RaspBugs, a new benchmark of 1,466 buggy symbolic RASP programs and their respective numerical representations. Our experiments demonstrate that GBPR can effectively repair buggy symbolic programs by gradient-based optimization in the numerical program space, with convincing repair trajectories. To our knowledge, we are the first to formulate program repair as continuous optimization in a numerical program space. Our work establishes a new direction for program repair research, bridging two rich worlds: continuous optimization and program behavior.

evolutionary algorithm, gradient-based program repair, machine learning, (16 more...)

2505.17703

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
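
As a rough illustration of the idea in the GBPR abstract above (repair as continuous optimization guided by program behavior), here is a minimal sketch. It is not the authors' method: the "program" is a hypothetical toy, y = w * x + b, whose numerical representation is just the pair (w, b), and the behavior loss is a mean squared error over a few input-output tests. All names (`TESTS`, `loss_and_grad`, `repair`) are assumptions made for this example.

```python
# Toy "numerical program": y = w * x + b, represented by the pair (w, b).
# Assumption: the intended behavior is 3*x + 1, given as input-output tests.
TESTS = [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0)]


def loss_and_grad(w, b, tests):
    """Mean squared behavior error and its gradient with respect to (w, b)."""
    n = len(tests)
    loss = grad_w = grad_b = 0.0
    for x, y in tests:
        err = w * x + b - y
        loss += err * err / n
        grad_w += 2 * err * x / n
        grad_b += 2 * err / n
    return loss, grad_w, grad_b


def repair(w, b, tests, lr=0.1, steps=500):
    """Plain gradient descent on the behavior loss, starting from the buggy parameters."""
    for _ in range(steps):
        _, grad_w, grad_b = loss_and_grad(w, b, tests)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The buggy program computes 2*x + 1; descent moves it toward 3*x + 1.
w, b = repair(2.0, 1.0, TESTS)
final_loss, _, _ = loss_and_grad(w, b, TESTS)
```

The point of the sketch is only that the search signal comes from observed behavior (the test errors) rather than from edits to source tokens; how real symbolic programs are compiled into a differentiable form is not shown here.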

arXiv.org Artificial Intelligence, Nov-4-2025

HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning

Liu, Yujian, Ji, Jiabao, Zhang, Yang, Guo, Wenbo, Jaakkola, Tommi, Chang, Shiyu

Existing LLM-based automatic test generation methods mainly produce input and expected output pairs to categorize the intended behavior of correct programs. Although straightforward, these methods have limited diversity in generated tests and cannot provide enough debugging information. We propose HarnessLLM, a two-stage training pipeline that enables LLMs to write harness code for testing. Particularly, LLMs generate code that synthesizes inputs and validates the observed outputs, allowing complex test cases and flexible output validation such as invariant checking. To achieve this, we train LLMs with SFT followed by RLVR with a customized reward design. Experiments show that HarnessLLM outperforms input-output-based testing in bug finding and testing strategy diversity. HarnessLLM further benefits the code generation performance through test-time scaling with our generated test cases as inference-phase validation. Our code is available at https://github.com/UCSB-NLP-Chang/HarnessLLM.git.

large language model, machine learning, natural language, (19 more...)

2511.01104

Country: North America (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
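
To make the contrast in the HarnessLLM abstract above concrete, the following sketch shows what harness-style testing can look like: code that synthesizes inputs and validates outputs by checking invariants instead of comparing against fixed expected outputs. It is a hand-written illustration, not output from the HarnessLLM pipeline; `generate_inputs`, `check_invariants`, `run_harness`, and `buggy_sort` are hypothetical names.

```python
import random
from collections import Counter


def generate_inputs(n_random=20, seed=0):
    """Yield a few hand-picked edge cases, then seeded random lists."""
    yield []
    yield [1]
    yield [2, 2]
    rng = random.Random(seed)
    for _ in range(n_random):
        yield [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]


def check_invariants(xs, out):
    """A sort result must be ordered and contain exactly the input elements."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    same_elements = Counter(out) == Counter(xs)
    return ordered and same_elements


def run_harness(program):
    """Return the first input that violates an invariant, or None if none does."""
    for xs in generate_inputs():
        out = program(list(xs))
        if not check_invariants(xs, out):
            return xs
    return None


def buggy_sort(xs):
    # Bug: converting to a set silently drops duplicate elements.
    return sorted(set(xs))


failing_input = run_harness(buggy_sort)
clean_run = run_harness(sorted)
```

Because the check is an invariant, the harness needs no precomputed expected output for each input, which is what allows the inputs to be generated freely.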

Neural Information Processing Systems, Oct-9-2025, 23:02:44 GMT

34cc2ded6daba59357134c0b9fb06bfe-Supplemental-Datasets_and_Benchmarks_Track.pdf

buggy program, dataset, learner, (13 more...)

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.69)
Government (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications (0.94)
Information Technology > Software (0.93)
(2 more...)

Neural Information Processing Systems, Oct-9-2025, 23:02:40 GMT

Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation

buggy program, inference time, learner, (13 more...)

Country: Asia > Singapore (0.04)

Genre:

Research Report (0.68)
Workflow (0.49)

Industry:

Information Technology > Security & Privacy (0.48)
Education > Curriculum > Subject-Specific Education (0.46)
Education > Educational Setting (0.46)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

Rahul Gupta, Aditya Kanade, Shirish Shevade

Neural Attribution for Semantic Bug-Localization in Student Programs

Neural Information Processing Systems, Aug-20-2025, 09:02:35 GMT

Providing feedback is an integral part of teaching. Most open online courses on programming make use of automated grading systems to support programming assignments and give real-time feedback.

baseline, buggy program, neural network, (16 more...)

Country:

North America > United States (0.04)
North America > Canada (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre:

Instructional Material (0.68)
Research Report (0.46)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.54)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Orvalho, Pedro, Janota, Mikoláš, Manquinho, Vasco

Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization

arXiv.org Artificial Intelligence, Dec-19-2024

Automated Program Repair (APR) for introductory programming assignments (IPAs) is motivated by the large number of student enrollments in programming courses each year. Since providing feedback on IPAs requires substantial time and effort from faculty, personalized feedback often involves suggesting fixes to students' programs. Formal Methods (FM)-based semantic repair approaches, which check a program's execution against a test suite or reference solution, are effective but limited. These tools excel at identifying buggy parts but can only fix programs if the correct implementation and the faulty one share the same control flow graph. Conversely, Large Language Models (LLMs) are used for APR but often make extensive instead of minimal rewrites. This leads to more invasive fixes, making it harder for students to learn from their mistakes. In summary, LLMs excel at completing strings, while FM-based fault localization excels at identifying buggy parts of a program. In this paper, we propose a novel approach that combines the strengths of both FM-based fault localization and LLMs, via zero-shot learning, to enhance APR for IPAs. Our method uses MaxSAT-based fault localization to identify buggy parts of a program, then presents the LLM with a program sketch devoid of these buggy statements. This hybrid approach follows a CEGIS loop to iteratively refine the program. We ask the LLM to synthesize the missing parts, which are then checked against a test suite. If the suggested program is incorrect, a counterexample from the test suite is fed back to the LLM. Our experiments show that our counterexample-guided approach, using MaxSAT-based bug-free program sketches, significantly improves the repair capabilities of all six evaluated LLMs. This method allows LLMs to repair more programs with smaller fixes, outperforming other configurations and state-of-the-art symbolic program repair tools.

large language model, machine learning, natural language, (19 more...)

2502.07786

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(16 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
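
The counterexample-guided loop described in the Orvalho et al. abstract above can be sketched as follows. This is a minimal illustration under strong assumptions, not the paper's implementation: the program sketch is a toy absolute-value function with one hole, the hole is assumed to have already been located (no MaxSAT step is shown), and a fixed candidate list (`CANDIDATES`) stands in for the LLM that proposes fills. All names are hypothetical.

```python
# Test suite as (input, expected output) pairs for an absolute-value function.
TESTS = [(3, 3), (0, 0), (-2, 2)]

# Stand-in for LLM proposals: possible expressions for the hole.
CANDIDATES = {
    "x": lambda x: x,
    "0": lambda x: 0,
    "-x": lambda x: -x,
}


def sketch(hole, x):
    """Program sketch: the buggy else-branch has been replaced by a hole."""
    return x if x > 0 else hole(x)


def propose(counterexamples):
    """Return the first candidate consistent with every counterexample seen so far."""
    for name, fn in CANDIDATES.items():
        if all(sketch(fn, x) == expected for x, expected in counterexamples):
            return name
    return None


def find_counterexample(name, tests):
    """Return the first failing test for the filled sketch, or None if all pass."""
    for x, expected in tests:
        if sketch(CANDIDATES[name], x) != expected:
            return (x, expected)
    return None


def cegis_repair(tests, max_rounds=10):
    """Propose, check against the tests, and feed back counterexamples until a fill passes."""
    counterexamples = []
    for _ in range(max_rounds):
        name = propose(counterexamples)
        if name is None:
            return None, counterexamples
        cex = find_counterexample(name, tests)
        if cex is None:
            return name, counterexamples
        counterexamples.append(cex)
    return None, counterexamples


fix, feedback = cegis_repair(TESTS)
```

In this toy run the first proposal, `x`, fails on the input -2, and that single counterexample is enough to steer the next proposal to `-x`. Only the hole changes, which mirrors the abstract's point about keeping fixes small.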

Song, Jialin, Raiman, Jonathan, Catanzaro, Bryan

Effective Large Language Model Debugging with Best-first Tree Search

arXiv.org Artificial Intelligence, Jul-26-2024

The code-writing abilities of large language models (LLMs) are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental difference between how an LLM writes code and how a human programmer does is that the LLM cannot consistently spot and fix bugs. Debugging is a crucial skill for programmers, and it enables iterative code refinement towards a correct implementation. In this work, we propose a novel algorithm to enable LLMs to debug their code via self-reflection and search, where a model attempts to identify its previous mistakes. Our key contributions include a best-first tree search algorithm with self-reflections (BESTER) that achieves state-of-the-art Pass@1 in three code generation benchmarks. BESTER maintains its superiority when we measure pass rates taking into account additional inference costs incurred by tree search.

arxiv preprint arxiv, attribution score, language model, (13 more...)

2407.19055

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > British Indian Ocean Territory > Diego Garcia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
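
The best-first tree search mentioned in the Song et al. abstract above can be illustrated with a generic sketch. It is plain best-first search over candidate programs, not the BESTER algorithm itself: candidates are hypothetical coefficient pairs (a, b) for a * x + b, the score is the number of tests passed, and `refine` stands in for the model's self-reflection step that would propose revised programs.

```python
import heapq
from itertools import count

# Assumed target behavior for the toy task: 3*x + 1.
TESTS = [(0, 1), (1, 4), (2, 7)]


def score(candidate):
    """Number of tests the candidate program a*x + b passes."""
    a, b = candidate
    return sum(1 for x, expected in TESTS if a * x + b == expected)


def refine(candidate):
    """Stand-in for self-reflection: propose small edits to the candidate."""
    a, b = candidate
    return [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]


def best_first_search(root, goal_score, max_expansions=50):
    """Always expand the highest-scoring unexpanded candidate first."""
    tie = count()  # insertion order breaks ties between equal scores
    frontier = [(-score(root), next(tie), root)]
    seen = {root}
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_score, _order, node = heapq.heappop(frontier)
        if -neg_score >= goal_score:
            return node
        for child in refine(node):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child), next(tie), child))
    return None


solution = best_first_search((0, 0), goal_score=len(TESTS))
```

The `max_expansions` budget is a crude proxy for the inference cost the abstract refers to: each expansion corresponds to one round of proposing refinements for a candidate.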